Metaphor Identification in Large Texts Corpora

Authors

Yair Neuman

Dan Assaf

Yohai Cohen

Mark Last

Shlomo Argamon

Newton Howard

Ophir Frieder

Abstract

Identifying metaphorical language use (e.g., sweet child) is one of the challenges facing natural language processing. This paper describes three novel algorithms for automatic metaphor identification, all variations of the same core algorithm. We evaluate the algorithms on two corpora of Reuters and New York Times articles. The paper presents the most comprehensive study of metaphor identification in terms of the scope of metaphorical phrases and the size of the annotated corpora. The algorithms' performance in classifying linguistic phrases as metaphorical or literal is compared to human judgment. Overall, the algorithms outperform the state-of-the-art algorithm, with 71% precision and a 27% average improvement in prediction over the base rate of metaphors in the corpus.
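The two figures quoted in the abstract, precision and improvement over the base rate, can be illustrated with a minimal sketch. The abstract does not define how the improvement is calculated, so the code below assumes it is the difference between precision and the share of metaphors in the annotated data. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
def precision(predicted, gold):
    """Share of phrases flagged as metaphorical that annotators also marked metaphorical."""
    flagged_gold = [g for p, g in zip(predicted, gold) if p]
    if not flagged_gold:
        return 0.0
    return sum(flagged_gold) / len(flagged_gold)


def base_rate(gold):
    """Share of all annotated phrases that are metaphorical."""
    return sum(gold) / len(gold)


def improvement_over_base_rate(predicted, gold):
    """Assumed definition: precision minus base rate (not confirmed by the abstract)."""
    return precision(predicted, gold) - base_rate(gold)


# Hypothetical toy data: 1 = metaphorical, 0 = literal.
gold = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]       # human judgment
predicted = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0]  # algorithm output

print(f"precision:   {precision(predicted, gold):.2f}")
print(f"base rate:   {base_rate(gold):.2f}")
print(f"improvement: {improvement_over_base_rate(predicted, gold):.2f}")
```

Under this reading, always guessing "metaphorical" would score a precision equal to the base rate, so the difference shows how much the classifier adds over that trivial strategy.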


Similar resources

A Study of Ideational Grammatical Metaphor in Health Texts of English Newspapers

Systemic functional grammar constructs a grammar for the purpose of text analysis to investigate how grammar is used as a means of making meaning. Grammatical metaphor is one of the language phenomena introduced by Halliday (2004) in the framework of functional grammar. The present work focuses on the application of Halliday’s metafunctional framework in health texts of English newspapers. The ...


A Comparative Study of Ideational Grammatical Metaphor in Scientific and Political Texts

Language, science and politics go together and learning these genres is to learn a language created for codifying, extending and transmitting scientific and political knowledge. Grammatical metaphor is divided into two broad areas: ideational and interpersonal. This paper focuses on the first type, i.e. Ideational Grammatical Metaphor (IGM), which includes process types and nominalization. The m...


Measuring Interlanguage: Native Language Identification with L1-influence Metrics

The task of native language (L1) identification suffers from a relative paucity of useful training corpora, and standard within-corpus evaluation is often problematic due to topic bias. In this paper, we introduce a method for L1 identification in second language (L2) texts that relies only on much more plentiful L1 data, rather than the L2 texts that are traditionally used for training. In par...


Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity

In this study we analyze texts used in Russian Unified State Exam on English language. Texts that formed small research corpora were retrieved from 2 resources: official USE database as a reference point, and popular website used by pupils for USE training “Neznaika” (https://neznaika.pro/). The size of two corpora is balanced: USE has 11934 tokens and “Neznaika” - 11918 tokens. We share Biber’...


A lexicon of perception for the identification of synaesthetic metaphors in corpora

Synaesthesia is a type of metaphor associating linguistic expressions that refer to two different sensory modalities. Previous studies, based on the analysis of poetic texts, have shown that synaesthetic transfers tend to go from the lower toward the higher senses (e.g., sweet music vs. musical sweetness). In non-literary language synaesthesia is rare, and finding a sufficient number of example...




Volume 8


Publication date: 2013
